feat: PG store config override + cross-replica auth sync#2495
feat: PG store config override + cross-replica auth sync#2495xandr0s wants to merge 2 commits intorouter-for-me:mainfrom
Conversation
Two features for Kubernetes multi-replica deployments with PG store: 1. Config override via PGSTORE_CONFIG_OVERRIDE_PATH env var: When set, the file at that path is read on bootstrap and written into PG store, overriding any previously stored config. This lets k8s ConfigMaps take precedence over PG-stored config on every pod restart. 2. Auth sync polling via StartAuthSync(): Periodically polls auth_store table and incrementally updates local auth files. Only writes files whose content has actually changed, avoiding unnecessary fsnotify events. Uses mutex to prevent races with concurrent file access. Cleans up local files removed from PG. Both features are opt-in and fully backward compatible.
There was a problem hiding this comment.
Code Review
This pull request introduces support for several new login providers, including Kiro, GitLab, Cursor, and GitHub Copilot, along with an incognito mode for browser-based authentication. It also implements a background synchronization mechanism in the Postgres store to support multi-replica credential sharing and adds an environment variable for overriding database configuration with local files. Feedback highlights a redundant call to the synchronization loop, a regression in header processing, and logic errors in the file cleanup process that could lead to orphaned files or incorrect deletions.
cmd/server/main.go
Outdated
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) | ||
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) |
There was a problem hiding this comment.
The StartAuthSync method is called twice consecutively with the same parameters. This starts two redundant background synchronization loops, which is inefficient and could lead to unnecessary resource contention on the store's internal mutex.
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) | |
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) | |
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) |
| NextRefreshAfter: time.Time{}, | ||
| } | ||
| cliproxyauth.ApplyCustomHeadersFromMetadata(auth) | ||
| auths = append(auths, auth) |
There was a problem hiding this comment.
The call to cliproxyauth.ApplyCustomHeadersFromMetadata(auth) was removed. This appears to be a regression, as it prevents custom headers defined in the metadata from being applied to the authentication object. This will likely break functionality for providers that rely on these headers.
cliproxyauth.ApplyCustomHeadersFromMetadata(auth)
auths = append(auths, auth)
internal/store/postgresstore.go
Outdated
| entries, _ := os.ReadDir(s.authDir) | ||
| for _, e := range entries { | ||
| if e.IsDir() || seen[e.Name()] { | ||
| continue | ||
| } | ||
| os.Remove(filepath.Join(s.authDir, e.Name())) | ||
| changed = true | ||
| } |
There was a problem hiding this comment.
The cleanup logic for local auth files is flawed:
- Not recursive: It skips directories (
e.IsDir()), so files in subdirectories (which are created on line 201) will never be deleted if their database records are removed. - Incorrect key matching: It checks
seen[e.Name()], butseenis keyed by the fullid(e.g.,provider/user.json), whilee.Name()is only the base filename (e.g.,provider). This will cause incorrect deletions or failures to delete. - Error handling: Errors from
os.ReadDirandos.Removeare ignored.
Consider using filepath.WalkDir to correctly identify and remove orphaned files across the entire directory structure.
internal/store/postgresstore.go
Outdated
| ticker := time.NewTicker(interval) | ||
| defer ticker.Stop() | ||
| for { | ||
| select { | ||
| case <-ctx.Done(): | ||
| return | ||
| case <-ticker.C: | ||
| changed, err := s.incrementalAuthSync(ctx) | ||
| if err != nil { | ||
| log.Printf("postgres store: auth sync poll: %v", err) | ||
| continue | ||
| } | ||
| if changed && onChange != nil { | ||
| onChange() | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Using time.NewTicker in a background goroutine can cause synchronization tasks to pile up if incrementalAuthSync takes longer than the interval. Although the store uses a mutex, multiple goroutines could end up blocked waiting for the lock. A more robust approach is to use a loop with time.Sleep or time.After to ensure a fixed delay between the completion of one sync and the start of the next.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e4c1e50ce
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| NextRefreshAfter: time.Time{}, | ||
| } | ||
| cliproxyauth.ApplyCustomHeadersFromMetadata(auth) | ||
| auths = append(auths, auth) |
There was a problem hiding this comment.
Restore custom header hydration for Postgres auth records
Re-add cliproxyauth.ApplyCustomHeadersFromMetadata(auth) before appending each auth in List. This commit removed that call, so PG-backed auth entries no longer populate Attributes["header:..."] from metadata, unlike the git/object stores; any provider that depends on custom headers will silently lose them after load/restart and requests can fail or be routed incorrectly.
Useful? React with 👍 / 👎.
cmd/server/main.go
Outdated
| cfg.AuthDir = pgStoreInst.AuthDir() | ||
| log.Infof("postgres-backed token store enabled, workspace path: %s", pgStoreInst.WorkDir()) | ||
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) | ||
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) |
There was a problem hiding this comment.
Remove duplicate auth sync poller startup
The second StartAuthSync call starts an extra goroutine with the same 30s interval, so each instance polls and syncs auth data twice. In production PG deployments this doubles steady-state database load and file-sync churn for no functional gain.
Useful? React with 👍 / 👎.
…s, fix cleanup - Remove duplicate StartAuthSync call in main.go - Restore cliproxyauth.ApplyCustomHeadersFromMetadata(auth) in List() - Fix file cleanup: use filepath.WalkDir for recursive traversal, match by relative path (not just filename) - Replace time.Ticker with time.After to prevent sync pileup
672e058 to
dc88a48
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc88a4803c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if err == nil { | ||
| cfg.AuthDir = pgStoreInst.AuthDir() | ||
| log.Infof("postgres-backed token store enabled, workspace path: %s", pgStoreInst.WorkDir()) | ||
| pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil) |
There was a problem hiding this comment.
Use a bounded context for auth sync polling
StartAuthSync is started with context.Background(), but incrementalAuthSync takes the store mutex before QueryContext; if the DB call stalls (for example during a network partition), the poll can block indefinitely while holding s.mu, which also blocks Save, Delete, and PersistAuthFiles on that replica. Please pass a cancellable context with a per-poll timeout instead of an unbounded background context.
Useful? React with 👍 / 👎.
| if err := os.WriteFile(path, []byte(payload), 0o600); err != nil { | ||
| return false, fmt.Errorf("write auth file: %w", err) |
There was a problem hiding this comment.
Prevent poller from clobbering pending local auth edits
This overwrite path applies whenever file bytes differ from the DB payload, so in PG mode a direct auth-file edit can be lost if the poll runs before watcher-driven PersistAuthFiles finishes (that persistence is asynchronous): the poll writes stale DB content back to disk, and the stale version is then re-persisted. That makes file-based auth updates non-deterministic under concurrent polling.
Useful? React with 👍 / 👎.
|
Thanks for the thorough review! Regarding the P2 concern about poll clobbering local edits: In our deployment, auth credentials are always added via management UI or CLI login — both write to PG store first, then mirror to disk. The poll only syncs PG → local, never the reverse direction. So the race between poll and local file edits doesn't apply to our use case. That said, adding an |
xkonjin
left a comment
There was a problem hiding this comment.
Code Review: PG Store Config Override + Cross-Replica Auth Sync
Summary
Well-architected feature for multi-replica deployments. The auth sync mechanism enables horizontal scaling without sticky sessions, and the config override provides necessary flexibility for k8s environments.
Strengths
- Smart incremental sync — only writes changed files, avoiding fsnotify storms
- Atomic file operations — proper temp file + rename pattern in incrementalAuthSync
- Config override is well-designed — PGSTORE_CONFIG_OVERRIDE_PATH provides clean k8s integration
- Orphan cleanup — removes local auth files not present in DB, preventing stale credential accumulation
Issues
1. Potential race condition in auth sync startup
In main.go, StartAuthSync is called immediately after pgStoreInst creation, but there's a small window where the bootstrap may not have completed. Consider starting sync after Bootstrap() returns.
2. Error handling in orphan cleanup is silent
If removal fails (permissions, filesystem issues), it's silently ignored. Consider logging at WARN level.
3. No backoff for sync failures
The sync loop logs errors but retries immediately on the next interval. If PG is temporarily unavailable, this could spam logs. Consider adding a simple backoff or at least a counter.
4. Missing test coverage
- No tests for incrementalAuthSync logic
- No tests for config override fallback path (when override file read fails)
- No tests for orphan cleanup behavior
Security
- Proper file permissions (0o600 for files, 0o700 for dirs)
- SQL queries use parameterized queries implicitly via QueryContext
- Consider adding authentication/authorization on the auth_store table access if multiple untrusted pods share the PG instance
Performance
The 30-second poll interval is reasonable for auth propagation, but consider making it configurable for latency-sensitive environments
Verdict
Good foundational work for multi-replica support. The suggestions are improvements, not blockers. Recommend adding tests before merge if time permits.
luispater
left a comment
There was a problem hiding this comment.
Summary:
Adds two Postgres-store features aimed at multi-replica/Kubernetes deployments: (1) PGSTORE_CONFIG_OVERRIDE_PATH to force an on-disk config to win on bootstrap, and (2) auth sync polling to mirror auth_store changes into local auth files.
Key findings:
- Config override is applied early during bootstrap and keeps the DB + local spool consistent.
- Auth sync is incremental (content-compare before write) and serialized to reduce race risk and unnecessary fsnotify churn.
- The poll loop uses
time.After(interval)to avoid ticker pileups if a sync iteration runs long.
Non-blocking suggestions:
- In the cleanup pass, consider normalizing
relwithfilepath.ToSlash(rel)before checking againstseento avoid OS-separator mismatches. - Consider starting
StartAuthSync()with a cancellable lifecycle context instead ofcontext.Background()for graceful shutdown (optional). - Consider guarding/documenting
interval > 0to avoid a tight loop if misconfigured (optional).
Test plan:
- go test ./...
This is an automated Codex review result and still requires manual verification by a human reviewer.
Summary
Two features for Kubernetes multi-replica deployments with PostgreSQL store:
1. Config override via PGSTORE_CONFIG_OVERRIDE_PATH
When this env var is set, the file at that path is read on bootstrap and written into PG store, overriding any previously stored config. This lets k8s ConfigMaps take precedence over PG-stored config on every pod restart.
Problem: PG store seeds config from file on first run, then always reads from DB — ignoring ConfigMap updates on subsequent deploys.
2. Auth sync polling via StartAuthSync()
Periodically polls auth_store table and incrementally updates local auth files. When one replica adds/refreshes OAuth credentials, other replicas pick them up within 30s.
Implementation details:
Changes
Both features are opt-in and fully backward compatible.
Test plan